Abstract

Let $G = G_{n,k}$ denote the graph formed by placing points in a square of area $n$ according to a Poisson process of density 1 and joining each point to its $k$ nearest neighbours. In [2] Balister, Bollobás, Sarkar and Walters proved that if $k < 0.3043\log n$ then the probability that $G$ is connected tends to 0, whereas if $k > 0.5139\log n$ then the probability that $G$ is connected tends to 1.

We prove that, around the threshold for connectivity, all vertices near the boundary of the square are part of the (unique) giant component. This shows that arguments about the connectivity of $G$ do not need to consider 'boundary' effects.

We also improve the upper bound for the threshold for connectivity 
of G to k = 0.4125 log n. 

1 Introduction 

Let $S_n$ denote a $\sqrt n \times \sqrt n$ square and let $G_{n,k}$ denote the graph formed by placing points in $S_n$ according to a Poisson process $\mathcal{P}$ of density 1 and joining each point to its $k$ nearest neighbours by an undirected edge. Since we shall be interested in the asymptotic behaviour of this graph as $n \to \infty$, it is convenient to introduce one piece of notation. For a graph property $\Pi$ we say that $G_{n,k}$ has $\Pi$ with high probability (abbreviated to whp) if $\mathbb{P}(G_{n,k} \text{ has } \Pi) \to 1$ as $n \to \infty$.
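To make the model concrete, here is a minimal brute-force simulation sketch of $G_{n,k}$. It is our illustration, not code from the paper: the helper names (`poisson_points`, `knn_graph`, `num_components`) are hypothetical, the quadratic neighbour search is only suitable for small $n$, and at such small $n$ none of the asymptotic constants below are visible.

```python
import math
import random


def poisson_points(n, rng):
    """Poisson process of density 1 in the sqrt(n) x sqrt(n) square S_n.

    The number of points is Poisson(n): we count arrivals of a rate-1
    process on [0, n] (sums of Exp(1) gaps), then place points uniformly.
    """
    side = math.sqrt(n)
    count, t = 0, rng.expovariate(1.0)
    while t <= n:
        count += 1
        t += rng.expovariate(1.0)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(count)]


def knn_graph(points, k):
    """Edge set of the undirected k-nearest-neighbour graph (brute force)."""
    edges = set()
    for i, p in enumerate(points):
        nearest = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        for _, j in nearest:
            edges.add((min(i, j), max(i, j)))
    return edges


def num_components(num_vertices, edges):
    """Number of connected components, via union-find with path halving."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(num_vertices)})


if __name__ == "__main__":
    rng = random.Random(1)
    n = 300
    pts = poisson_points(n, rng)
    for k in (1, 2, 4, 8):
        comps = num_components(len(pts), knn_graph(pts, k))
        print(f"n={n}, points={len(pts)}, k={k}: {comps} component(s)")
```

Note that each point contributes its $k$ out-edges, and an edge chosen by both endpoints is stored once, which is exactly the undirected graph described above.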

Xue and Kumar [5] proved that the threshold for connectivity is $\Theta(\log n)$; more precisely they showed that if $k = k(n) > 5.1774\log n$ then $G_{n,k}$ is connected whp, and if $k = k(n) < 0.074\log n$ then $G_{n,k}$ is whp not connected.

Subsequent work by Balister, Bollobás, Sarkar and Walters [2] substantially improved the upper and lower bounds to $0.5139\log n$ and $0.3043\log n$ respectively. In their proof they also showed that for any $k = \Theta(\log n)$ the graph consists of a giant component containing a proportion $1 - o(1)$ of all vertices and (possibly) some other 'small' components of (Euclidean) diameter $O(\sqrt{\log n})$ (for a formal statement see Lemma 3).

Moreover, they showed that if $k > 0.311\log n$ then $G$ has no small component within distance $O(\sqrt{\log n})$ of the boundary of $S_n$. Unfortunately, there is a gap between this bound and the lower bound of $0.3043\log n$ mentioned above. This means that close to the threshold for connectivity the obstruction to connectivity could occur near the boundary of the square or it could occur in the centre (their methods did rule out the possibility that the obstruction occurs in the corner of the square). This has caused several problems in later papers (e.g., [3]) where the authors had to consider both cases in their proofs.

Our main result is the following theorem showing that, in fact, the obstruction must occur away from the boundary of $S_n$. This should simplify subsequent work, in that only central components need to be considered. (Of course, the improvement itself is only of minor interest; what matters is that the new upper bound for the existence of components near the boundary is smaller than the general lower bound.)

Theorem 1. Suppose that $G = G_{n,k}$ for some $k > 0.272\log n$. Then there is a constant $\varepsilon > 0$ such that the probability that there exists a vertex within distance $\log n$ of the boundary of $S_n$ that is not contained in the giant component is $O(n^{-\varepsilon})$.

Remark. The distance $\log n$ to the boundary is much larger than the typical edge length and (non-giant) component sizes, which are $O(\sqrt{\log n})$. Moreover, the theorem would still be true with $\log n$ replaced by a small power of $n$.

Our second result is the following improvement on the upper bound for 
connectivity of G. 

Theorem 2. Suppose that $G = G_{n,k}$ for some $k > 0.4125\log n$. Then whp $G$ is connected.

To illustrate Theorem 2 let $D$ be a disc of radius $r$ and consider the event that there are $k+1$ points inside $D$ and no points in $3D \setminus D$ (where $3D$ denotes the disc with the same centre as $D$ and three times the radius). If this event occurs then the $k$ nearest neighbours of any point in $D$ also lie in $D$: in particular, there are no 'out'-edges from $D$ to the rest of the graph. If we choose $r$ such that $9\pi r^2 \approx k+1$ (to maximise the probability of this event) then the probability of a specific instance of this event is about $9^{-(k+1)}$. Since we can fit $\Theta(n/\log n)$ disjoint copies of this event into $S_n$, we see that if $k < (\frac{1}{\log 9} - \varepsilon)\log n$ (for some $\varepsilon > 0$) then whp this event occurs somewhere in $S_n$, and thus $G$ has a subgraph with no out-degree. Since $1/\log 9 \approx 0.455 > 0.4125$, Theorem 2 shows that there is a range of $k$ for which the graph is connected whp but contains pieces with no out-degree. (The corresponding result for in-degree was proved in [2].)
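The two numerical claims in this heuristic (that $\lambda = \pi r^2 = (k+1)/9$ maximises the probability, and that the probability is $9^{-(k+1)}$ up to a polynomial factor) can be checked directly, since the counts in $D$ and in $3D \setminus D$ are independent Poisson variables with means $\lambda$ and $8\lambda$. A small Python sketch; the function name is ours and $\log$ is the natural logarithm:

```python
import math


def log_prob_closed_disc(k, lam):
    """log P(#D = k+1 and #(3D minus D) = 0) for a density-1 Poisson process,
    where lam = |D| = pi*r^2 and hence |3D minus D| = 8*lam."""
    m = k + 1
    log_poisson_mass = -lam + m * math.log(lam) - math.lgamma(m + 1)
    log_void = -8 * lam
    return log_poisson_mass + log_void


k = 50
lam_star = (k + 1) / 9  # stationary point of -9*lam + (k+1)*log(lam)
for lam in (0.8 * lam_star, lam_star, 1.2 * lam_star):
    print(f"lam = {lam:.3f}: log P = {log_prob_closed_disc(k, lam):.4f}")

# By Stirling, log P + (k+1) log 9 should be close to -0.5*log(2*pi*(k+1)).
gap = log_prob_closed_disc(k, lam_star) + (k + 1) * math.log(9)
print(gap, -0.5 * math.log(2 * math.pi * (k + 1)))
print("1/log 9 =", 1 / math.log(9))
```

The gap printed in the second-to-last line is the polynomial correction hidden in "about $9^{-(k+1)}$".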

The proofs of these two theorems are broadly similar: they use the ideas 
from [2] but also consider points which are near the small component but 
not contained in it. Indeed, if one looks at the lower bound proved in [2] 
we see that the density of points near the small component is higher than 
average. This is an unlikely event and we incorporate it into our bounds. 
Moreover, the above observation that there are small pieces of the graph with no out-degree shows that any proof of Theorem 2 (or any stronger bound) must consider points outside of a potential small component and show that they send edges in.

The key step is to split into two regimes depending on whether there is a point 'close' to the small component. If there is no such point then the 'excluded area' from the small component is quite large (which is unlikely), whereas if there is such a point then it must have a small $k$-nearest neighbour radius (which is also unlikely).

2 Notation and Preliminaries 

We start with some notation. For any point $x$ and real number $r$ let $D(x,r)$ denote the closed disc of radius $r$ about $x$. We shall also use the term half-disc of radius $r$ based at $x$ to mean one of the four regions obtained by dividing the disc $D(x,r)$ in half vertically or horizontally.

For a set $A$ in $S_n$ let $|A|$ denote the measure of $A$, and $\#A$ denote the number of points of $\mathcal{P}$ in $A$. For any real number $r$ let $A_{(r)}$ be the $r$-blowup of $A$ defined by

$$A_{(r)} = \{x \in \mathbb{R}^2 : d(x,A) \le r\}.$$

Note that we do allow $A_{(r)}$ to contain points outside of $S_n$.

Finally, whenever we use the term diameter we shall always mean the 
Euclidean diameter: we do not use graph diameter at any point in the paper. 

We shall need a few results from the paper of Balister, Bollobás, Sarkar
and Walters [2]. Since our notation is slightly different we quote them here 
for convenience. The first is a slight variant of Lemma 6 of [2] which follows 
immediately from the proof given there (see also Lemma 1 of [3]). 
Lemma 3. For fixed $c > 0$ and $L$, there exists $c_1 = c_1(c,L) > 0$, depending only on $c$ and $L$, such that for any $k \ge c\log n$, the probability that $G_{n,k}$ contains two components each of (Euclidean) diameter at least $c_1\sqrt{\log n}$ is $O(n^{-L})$.

The second bounds the probability of a small component near one side, or two sides, of $S_n$; it is explicit in the proof of Theorem 7 of [2]. (Note that Theorem 1 improves the first of these bounds.)

Lemma 4. Suppose that $k = \Theta(\log n)$. The probability that there is a small component containing a vertex within $\log n$ of one boundary of $S_n$ is $O(n^{1/2+o(1)}5^{-k})$, and the probability that there is a small component containing a vertex within $\log n$ of two sides of $S_n$ is $O(n^{o(1)}c_0^{-k})$ for a constant $c_0 > 1$.

The final result follows easily from concentration results for the Poisson distribution (see e.g. [1]) and most of it is implicit in Lemma 2 of [2].

Lemma 5. For any fixed $c$ and $L$ there is a constant $c_2 = c_2(c,L)$ such that for any $k$ with $c\log n \le k \le \log n$ the probability that there is any edge of length at least $c_2\sqrt{\log n}$, or any two points within distance $c_2^{-1}\sqrt{\log n}$ of each other not joined by an edge, or a point $x \in \mathcal{P}$ with a half-disc of radius $c_2\sqrt{\log n}$ based at $x$ contained entirely inside $S_n$ that contains no points of $\mathcal{P}$, is $O(n^{-L})$.

We will use the following simple but technical lemma several times. 

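Lemma 6. Suppose that $A$, $B$, $C$ are three sets in $S_n$ with $|A| \le |C|$ and $|B| \le |C|$. Then

$$\mathbb{P}\big(\#A \ge k,\ \#B \ge k,\ \#(A \cap B) = 0 \text{ and } \#C = 0\big) \le \left(\frac{4|A||B|}{(|A|+|B|+|C|)^2}\right)^k.$$

Proof. Let $A' = (A \setminus B) \setminus C$, $B' = (B \setminus A) \setminus C$, $C' = C \cup (A \cap B)$, and $U = A \cup B \cup C = A' \sqcup B' \sqcup C'$. We see that $A'$, $B'$ and $C'$ are pairwise disjoint, so $|U| = |A'| + |B'| + |C'|$ and, since $\#(A \cap B) = 0$ and $\#C = 0$, that $\#A' = \#A \ge k$ and $\#B' = \#B \ge k$. Conditioning on the total number of points in $U$, we have

$$\begin{aligned}
\mathbb{P}(\#A \ge k, \#B \ge k, \#(A \cap B) = 0 \text{ and } \#C = 0)
&= \mathbb{P}(\#A' \ge k, \#B' \ge k \text{ and } \#C' = 0)\\
&= \sum_{t \ge 2k} \mathbb{P}(\#A' \ge k, \#B' \ge k, \#C' = 0 \mid \#U = t)\,\mathbb{P}(\#U = t)\\
&\le \max_{t \ge 2k} \mathbb{P}(\#A' \ge k, \#B' \ge k, \#C' = 0 \mid \#U = t)
\end{aligned}$$

(the final line follows since $\sum_{t} \mathbb{P}(\#U = t) \le 1$).

We have $|A'| \le |A| \le |C| \le |C'|$, so $|A'| \le \frac12|U|$, and similarly $|B'| \le \frac12|U|$. Write $\alpha = |A'|/|U|$ and $\beta = |B'|/|U|$, so that $\alpha, \beta \le \frac12$. Given $\#U = t$ the points are independent and uniform on $U$, so writing $t = 2k + j$ with $j \ge 0$,

$$\begin{aligned}
\mathbb{P}(\#A' \ge k, \#B' \ge k, \#C' = 0 \mid \#U = t)
&= \sum_{l=k}^{k+j} \binom{t}{l}\alpha^l\beta^{t-l}
 = (\alpha\beta)^k \sum_{i=0}^{j} \binom{2k+j}{k+i}\alpha^i\beta^{j-i}\\
&\le (\alpha\beta)^k\, 2^{-j}\, 2^{2k+j}
 = (4\alpha\beta)^k
 = \left(\frac{4|A'||B'|}{(|A'|+|B'|+|C'|)^2}\right)^k.
\end{aligned}$$

Finally, observe that $a \mapsto a/(a+b+c)^2$ is increasing for $a \le b + c$, so $|A'| \le |A| \le |C| \le |C'|$ and $|B'| \le |B| \le |C| \le |C'|$ imply that

$$\frac{4|A'||B'|}{(|A'|+|B'|+|C'|)^2} \le \frac{4|A||B'|}{(|A|+|B'|+|C'|)^2} \le \frac{4|A||B|}{(|A|+|B|+|C'|)^2} \le \frac{4|A||B|}{(|A|+|B|+|C|)^2},$$

which completes the proof. □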
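A quick numerical sanity check of Lemma 6 is possible in the special case where $A$, $B$, $C$ are pairwise disjoint, because then $\#A$, $\#B$, $\#C$ are independent Poisson variables with means $|A|$, $|B|$, $|C|$. The sketch below (ours; function names hypothetical) compares the exact probability with the bound over a small grid of areas satisfying $|A|, |B| \le |C|$. It checks instances only and is not a substitute for the proof.

```python
import math


def poisson_tail(lam, k):
    """P(Poisson(lam) >= k), summing the upper terms directly (lam > 0)."""
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, j = 0.0, k
    while term > 1e-18 * total or j < lam + 10:
        total += term
        j += 1
        term *= lam / j
    return total


def exact_prob(a, b, c, k):
    """P(#A >= k, #B >= k, #C = 0) for pairwise disjoint A, B, C of areas a, b, c."""
    return poisson_tail(a, k) * poisson_tail(b, k) * math.exp(-c)


def lemma6_bound(a, b, c, k):
    """Right-hand side of Lemma 6."""
    return (4 * a * b / (a + b + c) ** 2) ** k


worst = 0.0
areas = (0.5, 2.0, 8.0, 20.0)
for k in (1, 5, 20):
    for a in areas:
        for b in areas:
            for c in (max(a, b), 2 * max(a, b), 30.0):  # Lemma 6 needs a, b <= c
                ratio = exact_prob(a, b, c, k) / lemma6_bound(a, b, c, k)
                worst = max(worst, ratio)
print("largest exact/bound ratio:", worst)
```

The tail is summed upwards from $j = k$ rather than computed as one minus the lower sum, which avoids cancellation when the tail is tiny.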



3 Proof of Theorem 2

By hypothesis we have $k > 0.4125\log n$. Also, we may assume that $k < 0.6\log n$ since we already know that $G_{n,k}$ is connected whp if $k > 0.6\log n$. Let $c' = \max\{c_1(0.25,1), c_2(0.25,1), 1\}$ be as given by Lemmas 3 and 5 and let $M = 20000c'$. (We shall reuse some of the bounds we prove here in the proof of Theorem 1 so these are convenient values.) Tile $S_n$ with small squares of side length $s = \sqrt{\log n}/M$. We form a graph $\hat G$ on these tiles by joining two tiles whenever the distance between their centres is at most $2c'\sqrt{\log n}$. We call a pointset $\mathcal{P}$ bad if any of the following hold:

1. there exist two points that are joined in $G$ but the tiles containing these points are not joined in $\hat G$,

2. there exist two points, at most distance $20000s$ apart, that are not joined,

3. there exists a half-disc based at a point of $\mathcal{P}$ of radius $c'\sqrt{\log n}$ that is contained entirely in $S_n$ and contains no (other) point of $\mathcal{P}$,

4. there exist two components in $G_{n,k}$ with Euclidean diameter at least $c'\sqrt{\log n}$,

5. there exists a component of diameter at most $c'\sqrt{\log n}$ containing a vertex within distance $2c'\sqrt{\log n}$ of the boundary of $S_n$,

and good otherwise. We see that our choice of $c'$ and $M$ together with Lemma 5 imply that the probability that any of the first three conditions occur is $O(n^{-1})$. By Lemma 3 the probability of the fourth condition is $O(n^{-1})$. Since $k > 0.4125\log n > \frac{1}{\log 25}\log n$, Lemma 4 implies the probability of the last condition is $O(n^{-\varepsilon})$ for some $0 < \varepsilon < 1$. (Alternatively this follows from Theorem 1.) Combining these we see that the probability of a bad configuration is $O(n^{-\varepsilon})$.

Suppose that $\mathcal{P}$ is a good configuration but $G$ is not connected. Then there exists a component $F$ with diameter at most $c'\sqrt{\log n}$ not containing any vertex within $2c'\sqrt{\log n}$ of the boundary of $S_n$. Let $A$ be the collection of tiles that contain a point of $F$. Since the configuration is good, $A$ is a connected subset of $\hat G$ containing no tile within $c'\sqrt{\log n}$ of the boundary of $S_n$. Moreover, the bound on the diameter of $F$ implies that $A$ contains at most $16(c'M)^2$ tiles.
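For completeness, here is one way to see the tile count (our sketch; the constant 16 leaves room to spare). Since $F$ has diameter at most $c'\sqrt{\log n} = c'Ms$, it lies in an axis-parallel square of side $c'Ms$, which meets at most $c'M + 2$ columns and at most $c'M + 2$ rows of tiles; as $c'M \ge 20000$,

```latex
\#\{\text{tiles of } A\} \;\le\; (c'M + 2)^2 \;\le\; (2c'M)^2 \;=\; 4(c'M)^2 \;\le\; 16(c'M)^2 .
```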

The heart of the proof is in the following lemma that bounds the prob- 
ability of G having such a component. 

Lemma 7. Suppose $A$ is a connected subset of $\hat G$ containing no tile within $c'\sqrt{\log n}$ of the boundary of $S_n$. The probability that the configuration is good and that $G$ has a component contained entirely inside $A$ meeting every tile of $A$ is at most $O(11.3^{-k})$.

Proof. Suppose that F is a component of G meeting every tile in A. 

The proof of this lemma naturally divides into three steps. In the first 
step we define some regions based on the component F some of which must 
contain many points and some which must be empty. In the second step we 
bound the area of these regions. In the final step we bound the probability 
that these regions do indeed contain the required number of points. 

Step 1: Defining the regions. We use the following hexagonal construction which was introduced by Balister, Bollobás, Sarkar and Walters in [2].

Figure 1: The circumscribed hexagon $H$ and associated regions.



Let $H$ be the circumscribed hexagon of the points of $F$ obtained by taking the six tangents to the convex hull of $F$ at angles $0^\circ$ and $\pm 60^\circ$ to the horizontal, and let $H_1, \dots, H_6$ be the regions bounded by the exterior angle bisectors of $H$ as in Figure 1. Let $P_1, \dots, P_6$ be the points of $F$ on these tangents, and let $D_1, \dots, D_6$ denote the $k$-nearest neighbour disks of $P_1, \dots, P_6$. For $1 \le i \le 6$ let $A_i = D_i \cap H_i$. Let $A_0$ be the set $D_i \cap H$ with the smallest area. We see that for each $1 \le i \le 6$ the set $A_i$ contains no points of $\mathcal{P}$. Also $A_0$ contains $k+1$ points, all of which must be in $F$ and thus in $A$. Writing $A'$ for the set $A_0 \cap A$, we see that $A'$ contains at least $k+1$ points of $\mathcal{P}$.
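The hexagon $H$ is determined by the six support values of the convex hull of $F$ in the directions normal to its sides. A minimal Python sketch of that step, assuming points are given as coordinate pairs; the helper names are hypothetical, and the sketch constructs only $H$ and the points $P_i$, not the regions $H_i$:

```python
import math

# Outward normals of the six sides: sides at 0 and +/-60 degrees to the
# horizontal have normals at 90 + 60*j degrees, j = 0, ..., 5.
NORMALS = [
    (math.cos(math.radians(90 + 60 * j)), math.sin(math.radians(90 + 60 * j)))
    for j in range(6)
]


def hexagon_support(points):
    """Return [(h_j, P_j)]: the support value of the point set in direction
    NORMALS[j] and a point attaining it (a point of F on that tangent)."""
    result = []
    for nx, ny in NORMALS:
        best = max(points, key=lambda p: p[0] * nx + p[1] * ny)
        result.append((best[0] * nx + best[1] * ny, best))
    return result


def in_hexagon(q, support, tol=1e-12):
    """q lies in H iff its projection on every normal is at most h_j."""
    return all(
        q[0] * nx + q[1] * ny <= h + tol
        for (nx, ny), (h, _) in zip(NORMALS, support)
    )


F = [(0.0, 0.0), (2.0, 1.0), (1.0, 3.0), (-1.0, 2.0)]
for h, p in hexagon_support(F):
    print(f"support {h:.3f} attained at {p}")
```

Because $H$ is an intersection of six half-planes, membership is a conjunction of six linear inequalities, as in `in_hexagon`.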

We also wish to take account of points near to but not contained in $F$. Let $P \in F$ and $Q \in G \setminus F$ be vertices minimising the distance between $F$ and $G \setminus F$. Let $r_0 = d(P,Q)$ and $r = r_0 - \sqrt2 s$. Since we are assuming that every square of $A$ contains a point in $F$, we see that $A_{(r)} \setminus A$ contains no point of $\mathcal{P}$. Indeed, suppose there is a point of $\mathcal{P}$ in $A_{(r)} \setminus A$. Then this point is in $G \setminus F$ and is within $r_0$ of some point of $F$, which contradicts the definition of $r_0$.

Obviously the points $Q$ and $P$ are not joined so, in particular, the $k$ points nearest to $Q$ must all be nearer to $Q$ than $P$ is. Moreover, since $Q$ is the point closest to $F$, we see that these $k$ points must all be further away from $P$ than $Q$ is. Combining these we see that these $k$ points lie in the set $B = D(Q,r_0) \setminus D(P,r_0)$.

Summarising all of the above, we see that $A'$ and $B$ each contain at least $k$ points and $A_{(r)} \setminus A$ and $\bigcup_{i=1}^6 A_i$ are both empty. The intersection $A' \cap B$ contains no points (so we can think of them as disjoint) but $A_{(r)}$ and $\bigcup_{i=1}^6 A_i$ will overlap significantly. Thus we will use Lemma 6 to form two separate bounds, one based on $A_{(r)} \setminus A$ being empty and one based on $\bigcup_{i=1}^6 A_i$ being empty.



Step 2: Bounding the area of the regions. In this step we assume that the 
configuration is good. 

First we bound $|\bigcup_{i=1}^6 A_i|$. Since the configuration is good each disc $D_i$ has radius at most $c'\sqrt{\log n}$ and each point $P_i$ is more than $2c'\sqrt{\log n}$ from the boundary of $S_n$. In particular $D_i$ is contained in $S_n$ for each $i$. Moreover, since $|D_i \cap H_i| \ge |D_i \cap H|$ for each $1 \le i \le 6$, we see that $|A_i| \ge |A_0|$. Since the $H_i$, and therefore the $A_i$, are disjoint, we have

$$\Big|\bigcup_{i=1}^6 A_i\Big| \ge 6|A_0| \ge 6|A'|.$$

The sets $B$ and $A_{(r)}$ both depend on $r$ so it is convenient to write $r$ in terms of $|A'|$ by letting $x = r/\sqrt{|A'|/\pi}$.

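Since $B = D(Q,r_0) \setminus D(P,r_0)$ with $d(P,Q) = r_0$, a simple calculation shows that $|B| = (\frac{\pi}{3} + \frac{\sqrt3}{2})r_0^2$. Since the configuration is good and $P$, $Q$ are not joined, Condition 2 gives $r_0 > 20000s$, so

$$r = r_0 - \sqrt2 s > r_0(1 - 10^{-4}).$$

Hence,

$$|B| = \Big(\frac{\pi}{3} + \frac{\sqrt3}{2}\Big)r_0^2 < \frac{\frac13 + \frac{\sqrt3}{2\pi}}{(1 - 10^{-4})^2}\,\pi r^2 = \frac{\frac13 + \frac{\sqrt3}{2\pi}}{(1 - 10^{-4})^2}\,x^2|A'| < 0.61x^2|A'|.$$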
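Both numerical steps above can be checked quickly: the exact ratio $|B|/(\pi r_0^2) = \frac13 + \frac{\sqrt3}{2\pi}$, its inflation by $(1-10^{-4})^{-2}$, and a Monte Carlo estimate of $|D(Q,r_0) \setminus D(P,r_0)|$ when $d(P,Q) = r_0$. A Python sketch (our own check; the helper name is hypothetical):

```python
import math
import random

# |B| / (pi * r0^2) for B = D(Q, r0) minus D(P, r0) with d(P, Q) = r0.
const = 1 / 3 + math.sqrt(3) / (2 * math.pi)
# Passing from r0 to r with r > r0 * (1 - 1e-4) inflates the constant slightly.
inflated = const / (1 - 1e-4) ** 2
print(const, inflated)  # both are below 0.61


def mc_area_B(r0=1.0, samples=200_000, seed=0):
    """Monte Carlo estimate of |D(Q, r0) minus D(P, r0)| with P=(0,0), Q=(r0,0)."""
    rng = random.Random(seed)
    P, Q = (0.0, 0.0), (r0, 0.0)
    hits = 0
    for _ in range(samples):
        # Sample uniformly from the bounding box [0, 2 r0] x [-r0, r0] of D(Q, r0).
        pt = (rng.uniform(0, 2 * r0), rng.uniform(-r0, r0))
        if math.dist(pt, Q) <= r0 and math.dist(pt, P) > r0:
            hits += 1
    return hits / samples * (2 * r0) ** 2


print(mc_area_B(), math.pi / 3 + math.sqrt(3) / 2)
```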




Finally we bound $A_{(r)}$. Let $D$ and $D'$ be balls of area $|A|$ and $|A'|$ respectively. Since the configuration is good, the half-disc of radius $c'\sqrt{\log n}$ about the right-most point of $F$ must contain a point of $\mathcal{P}$. In particular $r < r_0 \le c'\sqrt{\log n}$, and so $A_{(r)}$ is contained in $S_n$. By the isoperimetric inequality in the plane

$$|A_{(r)} \setminus A| \ge |D_{(r)} \setminus D|,$$

and it is easy to see that $|D_{(r)} \setminus D| \ge |D'_{(r)} \setminus D'|$. Since $D'$ is a ball of radius $\sqrt{|A'|/\pi}$, $D'_{(r)}$ is a ball of radius $\sqrt{|A'|/\pi} + r = (1+x)\sqrt{|A'|/\pi}$, and we have

$$|D'_{(r)} \setminus D'| = \big((x+1)^2 - 1\big)|A'|.$$
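As a concrete instance of the isoperimetric step, compare a square and a disc of the same area: for a square of side $\ell$ the blow-up adds $4\ell r + \pi r^2$, while for a disc of the same area it adds $2\sqrt{\pi}\,\ell r + \pi r^2$, and $4 > 2\sqrt\pi$. The sketch below (our illustration; function names are hypothetical) also confirms the identity $|D'_{(r)} \setminus D'| = ((x+1)^2 - 1)|A'|$ numerically.

```python
import math


def square_gain(area, r):
    """|A_(r) minus A| for a square of the given area: 4*side*r + pi*r^2."""
    return 4 * math.sqrt(area) * r + math.pi * r ** 2


def disc_gain(area, r):
    """|D_(r) minus D| for a disc of the given area."""
    R = math.sqrt(area / math.pi)
    return math.pi * ((R + r) ** 2 - R ** 2)


a, a_prime, x = 5.0, 2.0, 0.7  # |A| >= |A'|, and x = r / sqrt(|A'|/pi)
r = x * math.sqrt(a_prime / math.pi)
print(square_gain(a, r), disc_gain(a, r), disc_gain(a_prime, r))
print(((1 + x) ** 2 - 1) * a_prime)  # equals disc_gain(a_prime, r)
```

The three printed gains decrease from left to right, mirroring the chain $|A_{(r)} \setminus A| \ge |D_{(r)} \setminus D| \ge |D'_{(r)} \setminus D'|$ for this particular square.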
Step 3: Bounding the probability of such a configuration. We have seen that if there is such a component $F$ then there exist regions as defined in Step 1. These regions are determined by 14 points: the six points defining sides of the hexagonal hull, their six $k$th nearest neighbour points and the points $P$ and $Q$; that is, if there is such a component $F$ then there are 14 points of $\mathcal{P}$ defining regions $A'$, $B$, $A_1, \dots, A_6$ and $A_{(r)}$ with $\#A' \ge k$, $\#B \ge k$, $\#(A' \cap B) = 0$, and both $\#\bigcup_{i=1}^6 A_i = 0$ and $\#(A_{(r)} \setminus A) = 0$.

Moreover, if the configuration is good all of these points must lie within $c'\sqrt{\log n}$ of $A$.

Let $Z$ be the event that there are 14 points of $\mathcal{P}$ all within $c'\sqrt{\log n}$ of $A$ defining regions with the above properties. We have

$$\mathbb{P}(\text{there exists } F \text{ and the configuration is good}) \le \mathbb{P}(Z \text{ and the configuration is good}) \le \mathbb{P}(Z).$$

We bound the probability that $Z$ occurs (note we are not assuming that the configuration is good). Fix a particular collection of 14 points of $\mathcal{P}$ and let $Z'$ be the event that these particular points witness $Z$. Note, since we are assuming these 14 points all lie within $c'\sqrt{\log n}$ of $A$, the corresponding regions all lie entirely within $S_n$.

We apply Lemma 6 to the sets $A'$, $B$ together with each of $\bigcup_{i=1}^6 A_i$ and $A_{(r)} \setminus A$.

First we form the bound based on $\#\bigcup_{i=1}^6 A_i = 0$. We have $|A'| \le |\bigcup_{i=1}^6 A_i|$ and, provided $x \le 3.13$, we have $|B| \le 0.61x^2|A'| \le 6|A'| \le |\bigcup_{i=1}^6 A_i|$, so Lemma 6 applies. Thus we see that

$$\mathbb{P}(Z') \le \left(\frac{4|A'||B|}{(|A'| + |\bigcup_{i=1}^6 A_i| + |B|)^2}\right)^k \le \left(\frac{4 \cdot 0.61x^2}{(7 + 0.61x^2)^2}\right)^k.$$

Secondly we form a bound based on $\#(A_{(r)} \setminus A) = 0$. This time $|B| \le 0.61x^2|A'| \le ((x+1)^2 - 1)|A'| \le |A_{(r)} \setminus A|$ and, provided that $x \ge \sqrt2 - 1$, we have $|A_{(r)} \setminus A| \ge |A'|$, so the conditions of Lemma 6 are satisfied. Thus

$$\mathbb{P}(Z') \le \left(\frac{4|A'||B|}{(|A'| + |A_{(r)} \setminus A| + |B|)^2}\right)^k \le \left(\frac{4 \cdot 0.61x^2}{((x+1)^2 + 0.61x^2)^2}\right)^k.$$

It is easy to check that the maximum of the minimum of these two bounds occurs when they are equal, i.e., when $x = \sqrt7 - 1$; at this point they are $a^{-k}$ for some $a > 11.3$. Therefore $\mathbb{P}(Z') \le a^{-k}$.
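The optimisation over $x$ can be reproduced numerically: at each $x$ take the smaller of whichever of the two bases is valid there, then maximise over a grid. A Python sketch (ours; it only re-evaluates the two displayed bounds, and the function names are hypothetical):

```python
import math


def bound_hex(x):
    """Base of the first bound, valid for x <= 3.13."""
    b = 0.61 * x * x
    return 4 * b / (7 + b) ** 2


def bound_blowup(x):
    """Base of the second bound, valid for x >= sqrt(2) - 1."""
    b = 0.61 * x * x
    return 4 * b / ((x + 1) ** 2 + b) ** 2


def best_bound(x):
    """Smallest of the bounds that are valid at x."""
    valid = []
    if x <= 3.13:
        valid.append(bound_hex(x))
    if x >= math.sqrt(2) - 1:
        valid.append(bound_blowup(x))
    return min(valid)


xs = [i / 10000 for i in range(1, 100001)]  # grid on (0, 10]
worst_x = max(xs, key=best_bound)
print(worst_x, math.sqrt(7) - 1, 1 / best_bound(worst_x))
```

The first bound increases in $x$ on its range while the second decreases beyond the crossing, so the worst case sits where they meet, and the reciprocal printed last is the base $a$.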
Since all 14 points must lie within $c'\sqrt{\log n}$ of $A$ there are $O((\log n)^{14})$ ways of choosing them. Hence, the expected number of 14-point sets for which $Z'$ occurs is $O((\log n)^{14}a^{-k}) = O(11.3^{-k})$. Thus $\mathbb{P}(Z) = O(11.3^{-k})$ and the proof of the lemma is complete. □

Since the degree of vertices in $\hat G$ is bounded and $16(c'M)^2$ is a (large) constant, there are only a constant number of connected sets of $\hat G$ of size at most $16(c'M)^2$ which contain a fixed tile, and therefore $O(n)$ such sets in total. Since $k > 0.4125\log n > \frac{1}{\log 11.3}\log n$, the expected number of small components in $G$ with the configuration good is $O(n \cdot 11.3^{-k}) = o(1)$. Thus

$$\begin{aligned}
\mathbb{P}(G \text{ is not connected}) &\le \mathbb{P}(\text{there is a small component and } \mathcal{P} \text{ is good}) + \mathbb{P}(\mathcal{P} \text{ is bad})\\
&= o(1) + O(n^{-\varepsilon}) = o(1),
\end{aligned}$$

so whp $G$ is connected. □
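The numerical thresholds used in this section are easy to reproduce. Writing $k = c\log n$, a quantity $n^{p}a^{-k}$ equals $n^{p - c\log a}$; the sketch below (ours, ignoring the $n^{o(1)}$ factor in Lemma 4 and all $O(\cdot)$ constants) evaluates this exponent for Condition 5 and for the count of small components.

```python
import math


def n_exponent(prefactor_power, c, base):
    """Exponent e with n^prefactor_power * base^(-k) = n^e when k = c * log n."""
    return prefactor_power - c * math.log(base)


c = 0.4125  # Theorem 2 assumes k > c log n
print("1/log 25   =", 1 / math.log(25))    # Lemma 4 threshold (one side)
print("1/log 11.3 =", 1 / math.log(11.3))  # Lemma 7 threshold
# Condition 5 via Lemma 4: n^(1/2) * 5^(-k); small components: n * 11.3^(-k).
print(n_exponent(0.5, c, 5), n_exponent(1.0, c, 11.3))
```

Both exponents are negative, although the second only barely: $0.4125$ sits just above $1/\log 11.3$.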



4 Proof of Theorem 1

Much of this is the same as the proof of Theorem 2 so we shall concentrate on the differences. This time, by hypothesis we have $k > 0.272\log n$ and again we may assume $k < 0.6\log n$. We use exactly the same tessellation of $S_n$ with small squares of side length $s = \sqrt{\log n}/M$, where $c' = \max\{c_1(0.25,1), c_2(0.25,1), 1\}$ and $M = 20000c'$ are given by Lemmas 3 and 5 as before. Again we form a graph $\hat G$ on these tiles by joining two tiles whenever the distance between their centres is at most $2c'\sqrt{\log n}$.

We need a slightly different definition of a bad pointset: the first four conditions are exactly as before but we replace the fifth condition by

5. there exists a component of diameter at most $c'\sqrt{\log n}$ containing a vertex within distance $3c'\sqrt{\log n}$ of two sides of $S_n$.

Note that this condition, together with Condition 4 on the diameter of small components, implies that for any small component at most one side of $S_n$ can have points of this small component within distance $2c'\sqrt{\log n}$ of it.

Since the tessellation is the same as in the proof of Theorem 2 we see that the probability that any of the original four conditions hold is $O(n^{-1})$ as before. Since $k > 0.272\log n$, Lemma 4 implies that the probability of the new condition above is $O(n^{-\varepsilon})$ for some $0 < \varepsilon < 1$. Combining these we see that the probability of a bad configuration is $O(n^{-\varepsilon})$.
Figure 2: The circumscribing set $H$ and associated regions.

Suppose that $\mathcal{P}$ is a good configuration but not all points within $\log n$ of the boundary of $S_n$ are contained in the giant component. Then there exists a component $F$ with diameter at most $c'\sqrt{\log n}$ containing a vertex within $\log n$ of the boundary of $S_n$. Let $A$ be the collection of tiles that contain a point of $F$. Since the configuration is good, $A$ is a connected subset of $\hat G$ and, as before, the bound on the diameter of $F$ implies that $A$ contains at most $16(c'M)^2$ tiles. This time at most one side of $S_n$ has any tiles of $A$ within $c'\sqrt{\log n}$ of it.

The following lemma, which is similar to Lemma 7, bounds the probability of such a small component.

Lemma 8. Suppose $A$ is a connected subset of $\hat G$ such that at most one side of $S_n$ has any tiles in $A$ within $c'\sqrt{\log n}$ of it. The probability that the configuration is good and that $G$ has a small component contained entirely inside $A$ which meets every square of $A$ is at most $(6.3)^{-k}$.

Remark. Obviously this lemma is only of interest for sets $A$ near the boundary, since otherwise Lemma 7 is stronger.

Proof. The proof divides into the same three steps as that of Lemma 7.

Step 1: Defining the regions. As before suppose that $F$ is a component of $G$ meeting every tile in $A$. Let $E$ be the (almost surely unique) side of $S_n$ closest to $F$.

This time let $H$ be the region bounded by the four interior sides of the circumscribed hexagon of the points of $F$ obtained by taking four of the tangents to the convex hull of $F$ at angles $90^\circ$ and $\pm 30^\circ$ to $E$, together with $E$, as in Figure 2. Let $H_1, \dots, H_4$ be the regions bounded by the exterior angle bisectors of $H$ and $E$. Let $P_1, \dots, P_4$ be the points of $F$ on these tangents, and let $D_1, \dots, D_4$ denote the $k$-nearest neighbour disks of $P_1, \dots, P_4$. For $1 \le i \le 4$ let $A_i = D_i \cap H_i$. Let $A_0$ be the set $D_i \cap H$ with the smallest area and write $A'$ for the set $A_0 \cap A$. Exactly as before we see that for $1 \le i \le 4$ the set $A_i$ is empty, and that $A'$ must contain at least $k+1$ points of $\mathcal{P}$.

As before let $P \in F$ and $Q \in G \setminus F$ be vertices minimising the distance between $F$ and $G \setminus F$, $r_0 = d(P,Q)$ and $r = r_0 - \sqrt2 s$. Again, since $F$ meets every tile of $A$ we see that $A_{(r)} \setminus A$ must be empty. Also, as before, the set $B = (D(Q,r_0) \setminus D(P,r_0)) \cap S_n$ must contain at least $k$ points.

Step 2: Bounding the area of the regions. In this step we assume the configuration is good.

First we bound $|\bigcup_{i=1}^4 A_i|$. Similarly to before we see that each disc $D_i$ has radius at most $c'\sqrt{\log n}$ so meets no side of $S_n$ apart from possibly $E$. Thus, we have $|D_i \cap H_i| \ge |D_i \cap H|$ for each $1 \le i \le 4$, so we see that $|A_i| \ge |A_0|$. As before the $H_i$, and therefore the $A_i$, are disjoint so

$$\Big|\bigcup_{i=1}^4 A_i\Big| \ge 4|A_0| \ge 4|A'|.$$

As before let $x = r/\sqrt{|A'|/\pi}$ and exactly as in the proof of Lemma 7 we have $|B| \le 0.61x^2|A'|$.

Finally we bound $A_{(r)} \setminus A$. Consider the point of $F$ furthest from $E$ and the half-disc of radius $c'\sqrt{\log n}$ about that point facing away from $E$. Since no point of $F$ is within $c'\sqrt{\log n}$ of any side of $S_n$ apart from $E$, this half-disc is entirely inside $S_n$, and so must contain a point of $\mathcal{P}$ (which is obviously not in $F$). Therefore, as before, $r < r_0 \le c'\sqrt{\log n}$. Thus $A_{(r)} \cap S_n = A_{(r)} \cap E^+$, where $E^+$ denotes the half-plane bounded by $E$ that contains $S_n$.

This time let $D$ and $D'$ be half-discs of area $|A|$ and $|A'|$ respectively, centred on $E$. Then, by the isoperimetric inequality in the half-plane $E^+$ (an easy consequence of the same inequality in the whole plane),

$$|(A_{(r)} \cap E^+) \setminus A| \ge |(D_{(r)} \cap E^+) \setminus D| \ge |(D'_{(r)} \cap E^+) \setminus D'|.$$

Now $D'$ is half a disc of radius $\sqrt{2|A'|/\pi}$ and $D'_{(r)} \cap E^+$ is half a disc of radius $\sqrt{2|A'|/\pi} + r = (1 + x/\sqrt2)\sqrt{2|A'|/\pi}$, so this time we have

$$|(D'_{(r)} \cap E^+) \setminus D'| = \big((1 + x/\sqrt2)^2 - 1\big)|A'|.$$
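The parenthetical claim that the half-plane inequality follows from the planar one can be made explicit by reflection. The following is our sketch of one such argument, not necessarily the one the authors had in mind. Let $\sigma$ denote reflection in the line $E$ and set $\tilde A = A \cup \sigma(A)$, so $|\tilde A| = 2|A|$ because $A \subset E^+$. For $y, z \in E^+$ we have $d(y, \sigma(z)) \ge d(y, z)$, so $\tilde A_{(r)} \cap E^+ = A_{(r)} \cap E^+$, and by symmetry $|\tilde A_{(r)} \setminus \tilde A| = 2|(A_{(r)} \cap E^+) \setminus A|$. The disc of area $2|A|$ centred on $E$ is $D \cup \sigma(D)$, so the planar inequality gives

```latex
2\,\big|(A_{(r)} \cap E^+) \setminus A\big|
  = \big|\tilde A_{(r)} \setminus \tilde A\big|
  \;\ge\; \big|(D \cup \sigma(D))_{(r)} \setminus (D \cup \sigma(D))\big|
  = 2\,\big|(D_{(r)} \cap E^+) \setminus D\big| .
```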
Step 3: Bounding the probability of such a configuration. We have seen that if there is such a component $F$ then there exist regions as defined above. These regions are determined by 10 points: the four points defining sides of the hexagonal hull, their four $k$th nearest neighbour points and the points $P$ and $Q$; that is, if there is such a component $F$ then there are 10 points of $\mathcal{P}$ defining regions $A'$, $B$, $A_1, \dots, A_4$ and $A_{(r)}$ with $\#A' \ge k$, $\#B \ge k$, $\#(A' \cap B) = 0$, and both $\#\bigcup_{i=1}^4 A_i = 0$ and $\#((A_{(r)} \cap S_n) \setminus A) = 0$. Again, if the configuration is good, all these points must lie within $c'\sqrt{\log n}$ of $A$.

Similarly to before, let $Z$ be the event that there are 10 points of $\mathcal{P}$ all within $c'\sqrt{\log n}$ of $A$ defining regions with the above properties. Again

$$\mathbb{P}(\text{there exists } F \text{ and the configuration is good}) \le \mathbb{P}(Z \text{ and the configuration is good}) \le \mathbb{P}(Z)$$

so, as before, we bound $\mathbb{P}(Z)$.

Fix a particular collection of 10 points and let $Z'$ be the event that these 10 points witness $Z$. Note, since we are assuming these 10 points all lie within $c'\sqrt{\log n}$ of $A$, the regions $A', A_1, \dots, A_4$ all lie entirely within $S_n$. By definition, $B$ and $(A_{(r)} \cap S_n) \setminus A$ also lie in $S_n$.

Again we apply Lemma 6 to the sets $A'$, $B$ together with each of $\bigcup_{i=1}^4 A_i$ and $(A_{(r)} \cap S_n) \setminus A$. This time, however, neither bound will be valid for large $x$ so we form a third bound based just on the two sets $A'$ and $(A_{(r)} \cap S_n) \setminus A$.

As before we base the first bound on $\#\bigcup_{i=1}^4 A_i = 0$. We have $|A'| \le |\bigcup_{i=1}^4 A_i|$ and, provided $x \le 2.56$, we have $|B| \le 0.61x^2|A'| \le 4|A'| \le |\bigcup_{i=1}^4 A_i|$, so Lemma 6 implies

$$\mathbb{P}(Z') \le \left(\frac{4|A'||B|}{(|A'| + |\bigcup_{i=1}^4 A_i| + |B|)^2}\right)^k \le \left(\frac{4 \cdot 0.61x^2}{(5 + 0.61x^2)^2}\right)^k.$$

The second bound, based on $\#((A_{(r)} \cap S_n) \setminus A) = 0$, is also very similar to before. However, this time the middle inequality in

$$|B| \le 0.61x^2|A'| \le \big((1 + x/\sqrt2)^2 - 1\big)|A'| \le |(A_{(r)} \cap S_n) \setminus A|$$

is not valid for all $x$, but it is valid for all $x \le 12$. Also, provided that $x \ge 2 - \sqrt2$, we have $|(A_{(r)} \cap S_n) \setminus A| \ge |A'|$, so for $2 - \sqrt2 \le x \le 12$ the conditions of Lemma 6 are satisfied. Thus

$$\mathbb{P}(Z') \le \left(\frac{4|A'||B|}{(|A'| + |(A_{(r)} \cap S_n) \setminus A| + |B|)^2}\right)^k \le \left(\frac{4 \cdot 0.61x^2}{((1 + x/\sqrt2)^2 + 0.61x^2)^2}\right)^k.$$
Since neither bound applies for large $x$ we form a third bound based on the two sets $A'$ and $(A_{(r)} \cap S_n) \setminus A$. We know $A'$ contains at least $k$ points and $(A_{(r)} \cap S_n) \setminus A$ is empty. This has probability at most

$$\left(\frac{|A'|}{|A'| + |(A_{(r)} \cap S_n) \setminus A|}\right)^k \le (1 + x/\sqrt2)^{-2k},$$

which is less than $80^{-k}$ for all $x > 12$.

As before the maximum of the minimum of the first two bounds occurs when they are equal, at $x = \sqrt2(\sqrt5 - 1)$; at this point they are $a^{-k}$ for some $a > 6.3$. Moreover the third bound is tiny in comparison. Thus, in all cases, $\mathbb{P}(Z') \le a^{-k}$ for some $a > 6.3$.
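The same numerical check works for Lemma 8, now with three bases: the first two as displayed above and the third, $(1 + x/\sqrt2)^{-2}$, which is valid for every $x$. A Python sketch (ours; function names hypothetical) that takes the smallest valid base at each $x$ and maximises over a grid:

```python
import math

R2 = math.sqrt(2)


def bound_sides(x):
    """First bound, valid for x <= 2.56."""
    b = 0.61 * x * x
    return 4 * b / (5 + b) ** 2


def bound_blowup(x):
    """Second bound, valid for 2 - sqrt(2) <= x <= 12."""
    b = 0.61 * x * x
    return 4 * b / ((1 + x / R2) ** 2 + b) ** 2


def bound_third(x):
    """Third bound (A' against the empty blow-up only), valid for all x."""
    return (1 + x / R2) ** -2


def best_bound(x):
    valid = [bound_third(x)]
    if x <= 2.56:
        valid.append(bound_sides(x))
    if 2 - R2 <= x <= 12:
        valid.append(bound_blowup(x))
    return min(valid)


xs = [i / 10000 for i in range(1, 300001)]  # grid on (0, 30]
worst_x = max(xs, key=best_bound)
print(worst_x, R2 * (math.sqrt(5) - 1), 1 / best_bound(worst_x))
print("1 / third bound at x = 12:", 1 / bound_third(12.0))
```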

Since all 10 points must lie within $c'\sqrt{\log n}$ of $A$ there are $O((\log n)^{10})$ ways of choosing them. Hence, similarly to before, the expected number of 10-point sets for which $Z'$ occurs is $O((\log n)^{10}a^{-k}) = O(6.3^{-k})$. Hence $\mathbb{P}(Z) = O(6.3^{-k})$ and the proof of the lemma is complete. □

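The remainder of the proof is very similar to before. There are only a constant number of connected sets of $\hat G$ of size at most $16(c'M)^2$ which contain a fixed tile, and therefore $O(\sqrt n\log n)$ such sets which contain a tile within distance $\log n$ of the boundary of $S_n$. Since $k > 0.272\log n > \frac{1}{\log 6.3}\log(n^{1/2+\varepsilon'})$ for some $\varepsilon' > 0$, the expected number of small components of $G$ that contain a vertex within distance $\log n$ of the boundary of $S_n$ when the configuration is good is $O(\sqrt n\log n\,(6.3)^{-k}) = o(n^{-\varepsilon'/2})$. Let $\varepsilon_0 = \min(\varepsilon'/2, \varepsilon)$, where $\varepsilon$ is the exponent in the bound $O(n^{-\varepsilon})$ on the probability of a bad configuration, and let $p$ be the probability that there exists a point of $\mathcal{P}$ within $\log n$ of the boundary of $S_n$ that is not in the giant component. Then

$$\begin{aligned}
p &\le \mathbb{P}(\text{there is a small boundary component and } \mathcal{P} \text{ is good}) + \mathbb{P}(\mathcal{P} \text{ is bad})\\
&= o(n^{-\varepsilon_0}) + O(n^{-\varepsilon_0}) = O(n^{-\varepsilon_0}),
\end{aligned}$$

as claimed. □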
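The same arithmetic for Theorem 1 (again ours, ignoring polylogarithmic factors): with $k = c\log n$ and $c = 0.272$ we have $\sqrt n\, 6.3^{-k} = n^{1/2 - c\log 6.3}$. The last line shows that, at this value of $c$, the one-side bound of Lemma 4 alone would give a positive exponent, which is consistent with the need for Lemma 8.

```python
import math

c = 0.272  # Theorem 1 assumes k > c log n

print("1/(2 log 6.3) =", 1 / (2 * math.log(6.3)))
# sqrt(n) * 6.3^(-k) = n^(1/2 - c log 6.3); any eps' below this margin works.
margin = c * math.log(6.3) - 0.5
print("eps' may be any positive number below", margin)
# Lemma 4's one-side bound n^(1/2) * 5^(-k) has exponent 1/2 - c log 5 here.
print("Lemma 4 one-side exponent:", 0.5 - c * math.log(5))  # positive
```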



Open Questions 

In this paper we have proved two results about the behaviour of the small components in the graph $G_{n,k}$. However, several questions about their properties remain open. We are interested in the behaviour near the connectivity threshold so, in particular, we assume in the following questions that $k$ is at least $0.3\log n$.
Question 1. Must the small components of $G_{n,k}$ be isolated? More precisely, is it the case that, whp, there do not exist two small components within distance $O(\sqrt{\log n})$ of each other?

Since the first draft of this paper Falgas-Ravry [4] has answered this question in the affirmative provided that the probability that $G$ is connected is not too small: more precisely he proves it whenever $\mathbb{P}(G \text{ is connected}) = \Omega(n^{-\gamma})$ (where $\gamma$ is an absolute constant).

Question 2. How many vertices do small components contain?

It is immediate from Lemma 6 of [2] (quoted as Lemma 3 of this paper) that all small components contain $O(k)$ vertices. If the lower bound construction of Balister, Bollobás, Sarkar and Walters in [2] is extremal then, as the authors remark there, all small components would contain $k + O(1)$ vertices.

Question 3. Are all the small components convex in the sense that all points of $\mathcal{P}$ within the convex hull of a small component are actually part of the small component?